Lucian Codrescu, Sr. Director, Technology, Qualcomm Technologies, Inc.

Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications





Qualcomm Technologies, Inc. All Rights Reserved



### Expansion of Hexagon DSP use cases beyond audio



## Hexagon DSP is evolving for use beyond voice and audio to computer vision, video and imaging features



### The Hexagon DSP evolution

Generational improvements in performance and power efficiency driven by both architecture and implementation





### Key characteristics of modem & multimedia applications

### Requirements

- Require fixed real-time performance level (fps, Mbit/sec, etc.)
- Extremely aggressive power & area targets

### Characteristics

- Mix of signal processing & control code
  - For modem, Qualcomm does not use a split CPU/DSP architecture. All processing is done on Hexagon DSP
  - Multimedia apps have significant control in the RTOS & frameworks
- Heavy L2 cache misses
  - Multimedia is data intensive
  - Modem is code intensive


# Hexagon DSP blends features targeted to modem & multimedia

### VLIW

- Need multi-issue to meet performance
- Low complexity for Area & Power

### Multi-Threading

- To reduce L2 cache miss penalty without the need for a large L2
- Increases instructions/VLIW packet because the compiler doesn't need to schedule for latency

### Innovate in ISA to maximize IPC

- More work/VLIW packet reduces energy/instruction
- Keep the pipelines full for MIPS/mm<sup>2</sup>
- Target both Signal Processing & Control code



### VLIW: Area & power efficient multi-issue



### Maximizing the signal processing code work/packet

Example from inner loop of FFT: executing 29 "simple RISC ops" in 1 cycle
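To give a feel for what the "simple RISC ops" in an FFT inner loop look like, here is a minimal sketch of one radix-2 butterfly in Q15 fixed point. It is an illustration only, not the packet shown on the slide. The names `cplx32`, `q15_mul` and `fft_butterfly` are hypothetical, and the inputs are assumed to fit in the signed 16-bit range.

```c
#include <stdint.h>

/* Complex value held in 32-bit fields; inputs are assumed to fit in int16 range. */
typedef struct { int32_t re, im; } cplx32;

/* Q15 fixed-point product, truncated toward zero (hypothetical helper). */
static int32_t q15_mul(int32_t a, int32_t b)
{
    return (a * b) / 32768;
}

/* One radix-2 butterfly: out0 = a + b*w, out1 = a - b*w, with w in Q15. */
void fft_butterfly(cplx32 a, cplx32 b, cplx32 w, cplx32 *out0, cplx32 *out1)
{
    cplx32 t;

    /* Complex multiply b*w: four multiplies, one subtract, one add. */
    t.re = q15_mul(b.re, w.re) - q15_mul(b.im, w.im);
    t.im = q15_mul(b.re, w.im) + q15_mul(b.im, w.re);

    /* Sum and difference: four more adds/subtracts. */
    out0->re = a.re + t.re;
    out0->im = a.im + t.im;
    out1->re = a.re - t.re;
    out1->im = a.im - t.im;
}
```

Written as scalar code, this single butterfly already needs four multiplies, four scalings and six additions or subtractions, before counting loads, stores and address updates. That is the kind of work a wide packet has to absorb.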



### Maximizing the control code work/packet

Hexagon DSP ISA improves control code efficiency over traditional VLIW

Example C code: `void example(int *ptr, int val) { if (ptr != 0) { *ptr = *ptr + val + 2; } }`

Assembly code, by packet:

- Traditional VLIW: Instr/Packet = 7 instr/5 packets = 1.4
  - Packet 1: `p0 = cmp.eq(r0,#0)`
  - Packet 2: `if (!p0) r2 = memw(r0)`; `if (p0) jumpr:nt r31`
  - Packet 3: `r2 = add(r2,#2)`
  - Packet 4: `r1 = add(r1,r2)`
  - Packet 5: `memw(r0) = r1`; `jumpr r31`
- Hexagon DSP: Dot-New Predication
  - Packet 1: `p0 = cmp.eq(r0,#0)`; `if (!p0.new) r2 = memw(r0)`; `if (p0.new) jumpr:nt r31`
  - Packet 2: `r2 = add(r2,#2)`
  - Packet 3: `r1 = add(r1,r2)`
  - Packet 4: `memw(r0) = r1`; `jumpr r31`
- Hexagon DSP: Compound ALU
  - Packet 1: `p0 = cmp.eq(r0,#0)`; `if (!p0.new) r2 = memw(r0)`; `if (p0.new) jumpr:nt r31`
  - Packet 2: `r1 = add(r1,add(r2,#2))`
  - Packet 3: `memw(r0) = r1`; `jumpr r31`
- Hexagon DSP: New-Value Store: Instr/Packet = 7 instr/2 packets = 3.5
  - Packet 1: `p0 = cmp.eq(r0,#0)`; `if (!p0.new) r2 = memw(r0)`; `if (p0.new) jumpr:nt r31`
  - Packet 2: `r1 = add(r1,add(r2,#2))`; `memw(r0) = r1.new`; `jumpr r31`
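The slide's C function and its two instructions-per-packet figures can be checked with a short sketch. `example` is the slide's own function written out as compilable C; `instr_per_packet` is a hypothetical helper added here only to reproduce the arithmetic.

```c
#include <stddef.h>

/* The example function from the slide, written out as compilable C. */
void example(int *ptr, int val)
{
    if (ptr != NULL) {
        *ptr = *ptr + val + 2;
    }
}

/* Average instructions per VLIW packet (hypothetical helper). */
double instr_per_packet(int instructions, int packets)
{
    return (double)instructions / (double)packets;
}
```

With the counts quoted on the slide, `instr_per_packet(7, 5)` gives 1.4 for the traditional VLIW schedule and `instr_per_packet(7, 2)` gives 3.5 for the Hexagon schedule.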



### High avg. instructions/packet for targeted use cases

Compound instructions count as 2





# Programmer's view of Hexagon DSP HW multi-threading

- Hexagon V5 includes three hardware threads
- Architected to look like a multi-core with communication through shared memory





### Hexagon DSP V1-V4: Interleaved multi-threading

#### Simple round-robin thread scheduling

- Number of threads matches execution pipe depth (three threads → three execute stages)
- All instructions complete before next packet dispatch
- Compiler schedules for zero latency, which helps to increase instructions/VLIW packet
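The round-robin rotation described above can be modelled in a few lines. This is a minimal sketch of the scheduling idea only, assuming one packet issues per cycle and the rotation never changes. The names `IMT_THREADS`, `imt_issuing_thread` and `imt_packets_issued` are hypothetical.

```c
#include <stdint.h>

enum { IMT_THREADS = 3 };  /* three hardware threads, per the slide */

/* Thread that issues a packet in a given cycle under strict round-robin. */
unsigned imt_issuing_thread(uint64_t cycle)
{
    return (unsigned)(cycle % IMT_THREADS);
}

/* Packets one thread has issued during cycles 0 .. cycles-1.
 * Thread t issues at cycles t, t+3, t+6, ... */
uint64_t imt_packets_issued(unsigned thread, uint64_t cycles)
{
    if (cycles <= thread) {
        return 0;
    }
    return (cycles - thread + IMT_THREADS - 1) / IMT_THREADS;
}
```

In this model each thread gets a slot every third cycle, so a packet has three cycles to finish before the same thread issues again. That spacing is why the compiler can schedule as if every instruction had zero latency.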




### Hexagon DSP V5: Dynamic HW multi-threading

#### Recover some performance when threads idle or stalled

- Remove a thread from IMT rotation
  - On L2 cache misses
  - When in wait-for-interrupt or off mode
- Additional forwarding to support 2-cycle packets
- VLIW packets with dependencies between long-latency instructions will stall
  - But many VLIW packets with simple instructions can complete in 2 processor clocks
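Removing a thread from the rotation can be sketched as a round-robin pick that skips threads that are not ready. This is a simplified model, not the hardware's actual arbitration. `DMT_THREADS`, `dmt_next_thread` and the `ready` flags are hypothetical, and the minimum spacing between packets of the same thread is not modelled.

```c
#include <stdbool.h>

enum { DMT_THREADS = 3 };  /* three hardware threads, per the slide */

/* Pick the next thread to issue after `last`, in round-robin order,
 * skipping threads that are not ready (assumed here to mean: waiting on
 * an L2 miss, in wait-for-interrupt, or off).
 * Pass last = -1 before any thread has issued.
 * Returns -1 when no thread is ready. */
int dmt_next_thread(int last, const bool ready[DMT_THREADS])
{
    for (int step = 1; step <= DMT_THREADS; ++step) {
        int candidate = (last + step) % DMT_THREADS;
        if (ready[candidate]) {
            return candidate;
        }
    }
    return -1;
}
```

With thread 1 stalled, the picks alternate between threads 0 and 2, so each remaining thread issues every second cycle instead of every third. This matches the slide's point that packets then need to complete in 2 processor clocks.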





### Hexagon DSP instructions per cycle





### Hexagon DSP V5: Efficient Architecture

Highly efficient mobile application processor — designed for more performance per MHz



Source: BDTI - For more detailed information see www.BDTI.com. All scores ©2013 BDTI

\* Projected best-case score for three threads

## Hexagon DSP Power Benefits





### MP3 playback power for competitive smartphones



- Power measured at the battery for various phones
- Includes everything: DSP, CPU, memory, analog components, etc.

Source: Qualcomm internal measurements



### Computer vision offload – ARM/NEON to Hexagon DSP



Source: Qualcomm internal measurements. \* Power measured at the device battery


### Hexagon DSP power for different thread utilizations

- Excellent near-linear power scalability (as threads go idle, power used by the thread is nearly eliminated)
- Achieved through optimized clock tree design & clock gating



**Dhrystone Power**

**FIR Power**



# Hexagon DSP Software Development





### Announcing the Hexagon DSP SDK


See the Hexagon DSP SDK in action at Uplinq 2013 (www.uplinq.com)



Hexagon DSP SDK

Eclipse-based Integrated Development Environment

Visit http://developer.qualcomm.com for more information.

# Thank you


#### For more information on Qualcomm, visit us at: www.qualcomm.com & www.qualcomm.com/blog

©2013 Qualcomm Technologies, Inc.

Qualcomm and Hexagon are trademarks of QUALCOMM Incorporated, registered in the United States and other countries. All QUALCOMM Incorporated trademarks are used with permission. Other product and brand names may be trademarks or registered trademarks of their respective owners. Hexagon is a product of Qualcomm Technologies, Inc.

